3. Row-Overflow Pages
While the maximum in-row size is 8,060 bytes per row,
SQL Server 2008 allows actual rows to exceed this size for tables that
contain varchar, nvarchar, varbinary, sql_variant,
or common language runtime (CLR) user-defined type columns. Although
the length of each one of these columns must still fall within the limit
of 8,000 bytes, the combined width of the row can exceed the 8,060-byte
limit.
When a combination of varchar, nvarchar, varbinary, sql_variant,
or CLR user-defined type columns exceeds the 8,060-byte limit, SQL
Server moves the record column with the largest width to another page in
the ROW_OVERFLOW_DATA allocation unit, while maintaining a
24-byte pointer to the row-overflow page on the original page. Moving
large records to another page occurs dynamically as records are
lengthened based on update operations. Update operations that shorten
records may cause records to be moved back to the original page in the IN_ROW_DATA allocation unit.
Row-overflow pages are used only under certain
circumstances. For one, the row itself has to exceed 8,060 bytes; it
does not matter how full the data page itself is. If a row is less than
8,060 bytes and there’s not enough space in the data page, normal page
splitting occurs to store the row. Also, each column in a table must be
completely on the row or completely off it. A variable-length column
cannot have some of its data on the regular data page and some of its
data on the row-overflow page. One row can span multiple row-overflow
pages depending on how many large variable-length columns there are.
Be aware that having data rows that require a
row-overflow page increases the I/O cost of retrieving the data row.
Querying and performing other select operations, such as sorts or joins
on large records that contain row-overflow data, also slow processing
time because these records are processed synchronously instead of
asynchronously.
Therefore, when you design a table with multiple varchar, nvarchar, varbinary, sql_variant,
or CLR user-defined type columns, you might want to consider the
percentage of rows that are likely to require row overflow and the
frequency with which this overflow data is likely to be queried. If
there are likely to be frequent queries on many rows of row-overflow
data, you should consider normalizing the table so that some columns are
moved to another table, reducing the overall row size so that the rows
fit within 8,060 bytes. The data can then be recombined in a query using
an asynchronous JOIN operation.
Tip
Because
of the performance implications, row-overflow pages are intended to be a
solution for situations in which most of your data rows fit completely
on your data pages and you only occasionally have rows that require a
row-overflow page. Row-overflow pages allow SQL Server to handle the
large data rows effectively without requiring a redesign of your table.
However, if you find more than a few of your rows exceed the in-row
size, you probably should look into using the LOB data types or
redesigning your table.
4. LOB Data Pages
If you want to store large amounts of text or binary data, you can use the text, ntext, and image data types, as well as the varchar(max), nvarchar(max), and varbinary(max) data types. Each column for a row of these data
types can store up to 2GB (minus 2 bytes) of data. By default, the LOB
values are not stored as part of the data row, but as a collection of
pages on their own. For each LOB column, the data page contains a
16-byte pointer, which points to the location of the initial page of the
LOB data. A row with several LOB columns has one pointer for each
column.
The pages that hold LOB data are 8KB in size, just
like any other page in SQL Server. An individual LOB page can hold LOB
data for multiple columns and also from multiple rows. A LOB data page
can even contain a mix of LOB data. This helps reduce the storage
requirements for the LOB data, especially when smaller amounts of data
are stored in these columns. For example, if SQL Server could store data
for only a single column for a single row on a single LOB data page and
the data value consisted of only a single character, it would still use
an entire 8KB data page to store that data! Definitely not an efficient
use of space.
A LOB data page can hold LOB data for only a single
table, however. A table with a LOB column has a single set of pages to
hold all its LOB data.
LOB information is presented externally (to the user)
as a long string of bytes. Internally, however, the information is
stored within a set of pages. The pages are not necessarily organized
sequentially but are logically organized as a B-tree structure. If an
operation addresses some information in the middle of the data, SQL
Server can navigate through the B-tree to find the data. In previous
versions, SQL Server had to follow the entire page chain from the
beginning to find the desired information.
If the amount of the data in the LOB field is less
than 32KB, the 16-byte pointer in the data row points to an 84-byte root
structure in the LOB B-tree. This root structure points to the pages
and location where the actual LOB data is stored (see Figure 4).
The data itself can be placed anywhere within the LOB pages for the
table. The root structure keeps track of the location of the information
in a logical manner. If the data is less than 64 bytes, it is stored in
the root structure itself.
If the amount of LOB data exceeds 32KB, SQL Server
allocates intermediate B-tree index nodes that point to the LOB pages.
In this situation, the intermediate node pages are stored on pages not
shared between different occurrences of LOB columns; the intermediate
node pages store nodes for only one LOB column in a single data row.
Storing LOB Data in the Data Row
To further conserve space and help minimize I/O, SQL
Server 2008 supports storing LOB data in the actual data row. When the
LOB data is stored outside the data row pages, at a minimum, SQL Server
needs to perform one additional page read per row to get the LOB data.
Why would you want to store LOB data in the row? Why not just store the data in a varchar(8000)?
Well, primarily because there is an upper limit of 8KB if the data is
stored within the data row (not counting the other columns). Using a LOB
data type, you can store more than 2 billion bytes of text. If you know most of your records will be small, but on occasion, some very large values will be stored, the text in row
option provides optimum performance and better space efficiency for the
majority of your LOB values, while providing the flexibility you need
for the occasional large values. This option also provides the benefit
of keeping the data all in a single column instead of having to split it
across multiple columns or rows when the data exceeds the size limit of
a single row.
If you want to enable the text in row option for a table with a LOB column, use the sp_tableoption stored procedure:
exec sp_tableoption pub_info, 'text in row', 512
This example enables up to 512 bytes of LOB data in the pub_info
table to be stored in the data row. The maximum amount of LOB data that
can be stored in a data row is 7,000 bytes. When a LOB value exceeds
the specified size, rather than store the 16-byte pointer in the data
row as it would normally, SQL Server stores the 24-byte root structure
that contains the pointers to the separate chunks of LOB data for the
row in the LOB column.
The second parameter to sp_tableoption can be just the option ON. If no size is specified, the option is enabled with a default size of 256 bytes. To disable the text in row option, you can set its value to 0 or OFF with sp_tableoption.
When the option is turned off, all LOB data stored in the row is moved
off to LOB pages and replaced with the standard 16-byte pointer. This
can be a time-consuming process for a large table.
Also, you should keep in mind that just because this
option is enabled it doesn’t always mean that the LOB data will be
stored in the row. All other data columns that are not LOB take priority
over LOB data for storage in the data row. If a variable-length column
grows and there is not enough space left in the row or page for the LOB
data, the LOB data is moved off the page.
Storage of MAX Data
An alternative to the text and image data types in SQL Server 2008 is the option of defining variable-length data using the MAX specifier. When you use the MAX specifier with varchar, nvarchar, and varbinary columns, SQL Server determines automatically whether to store the data as a regular varchar, nvarchar, or varbinary
value or as a LOB. Essentially, if the actual length is less than 8,000
bytes, SQL Server treats it as if it were one of the regular
variable-length data types, including using row-overflow pages if
necessary. If the MAX column exceeds 8,000 bytes, it is stored like LOB data.
5. Index Pages
Index information is stored on index pages. An index page
has the same layout as a data page. The difference is the type of
information stored on the page. Generally, a row in an index page
contains the index key and a pointer to the page or row at the next
(lower) level.
The actual information stored in an index page
depends on the index type and whether it is a leaf-level page. A
leaf-level clustered index page is the data page itself; you’ve already
seen its structure. The information stored on other index pages is as
follows:
Clustered indexes, nonleaf pages— Each index row contains the index key and a pointer (the fileId and a page address) to a page in the index tree at the next lower level.
Nonclustered index, nonleaf pages— Each index row contains the index key and a page-down pointer (the file ID and a page address) to a page in the index tree at the
next lower level. For nonunique indexes, the nonleaf row also contains
the row locator information for the corresponding data row.
Nonclustered index, leaf pages—
Rows on this level contain an index key and a reference to a data row.
For heap tables, this is the Row ID; for clustered tables, this is the
clustered key for the corresponding data row.